This notebooks will help me talk through a bit of material regarding spatial data, with some introduction to spatial joins.

Key topics for today:

Packages

Standards:

library(knitr)
library(tidyverse)
library(janitor)
library(lubridate) # because we will probably see some dates
library(here) # a package I haven't taught you about before that doesn't do much, but ....
Warning: package ‘here’ was built under R version 4.2.1
here() starts at U:/DS 241/ds241_f22/analysis/work

Some additional packages focused on today’s work:

library(sf) # working with simple features - geospatial
Warning: package ‘sf’ was built under R version 4.2.2
Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tmap)
Warning: package ‘tmap’ was built under R version 4.2.2
Registered S3 method overwritten by 'htmlwidgets':
  method           from         
  print.htmlwidget tools:rstudio
library(tidycensus)
Warning: package ‘tidycensus’ was built under R version 4.2.2

Informational resources

Using the Neighborhood Geospatial Data (using /data)

Our first data source comes from opendata.dc

https://opendata.dc.gov/datasets/DCGIS::dc-health-planning-neighborhoods/about

I will use the GeoJSON file. (Newer, not necessarily better, but … a single file. Not smaller, but … this one is not big.)

Data is easily readable

neigh=st_read(here("DC_Health_Planning_Neighborhoods.geojson")) %>% clean_names()
Reading layer `DC_Health_Planning_Neighborhoods' from data source 
  `U:\DS 241\ds241_f22\analysis\work\DC_Health_Planning_Neighborhoods.geojson' using driver `GeoJSON'
Simple feature collection with 51 features and 8 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -77.11976 ymin: 38.79165 xmax: -76.9094 ymax: 38.99556
Geodetic CRS:  WGS 84
class(neigh)
[1] "sf"         "data.frame"
plot(neigh)

Reminder - Joins

df1=tibble(fruit=c("apple","banana","cherry"),cost=c(1.5,1.2,2.25))
df2=tibble(fruit=c("apple","apple","cherry","lemon"),
           desert=c("pie","cobbler","cobbler","cheesecake"),
           cal=c(400,430,500,550))
df1
df2
left_join(df1,df2,by="fruit")

Investigating joining spatial and non-spatial data

Covid case information is available from opendatadc:

https://opendata.dc.gov/datasets/DCGIS::dc-covid-19-total-positive-cases-by-neighborhood/about

Read cases information:

df_c=read_csv(here("DC_COVID-19_Total_Positive_Cases_by_Neighborhood.csv")) %>% clean_names() 
Rows: 26372 Columns: 4
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): DATE_REPORTED, NEIGHBORHOOD
dbl (2): OBJECTID, TOTAL_POSITIVES

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_cases=df_c %>%
  filter(as_date(date_reported) == "2022-02-22") %>% 
  separate(neighborhood,into=c("code","name"),sep = ":") %>%
  mutate(code=case_when(
    code=="N35" ~"N0",
    TRUE ~ code
  )) %>%
  select(-objectid,-date_reported)

Regular joining (of dataframes)

neigh2=left_join(neigh,df_cases,by=c("code")) 

tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(neigh2) +tm_polygons("total_positives",alpha=.5)

Joining with other spatial data

Let’s get some data using tidycensus. Need an API key https://api.census.gov/data/key_signup.html



census_api_key("00b78ce52463bf386a260d23ec58edb622e6d3ac")
To install your API key for use in future sessions, run this function with `install = TRUE`.
#what variables
v20 = load_variables(2018,"acs5")
# median_family_income="    B06011_001" 
# all "B00001_001"  
#black "B02009_001"

Get some data:

df_cencus=get_acs(geography = "tract",
                  variables=c("median_inc"="B06011_001",
                              "pop"="B01001_001",
                              "pop_black"="B02009_001"),
                  state="DC",geometry=TRUE,year=2018) 
Getting data from the 2014-2018 5-year ACS
Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
Using FIPS code '11' for state 'DC'

  |                                                                                                             
  |                                                                                                       |   0%
  |                                                                                                             
  |==========================================                                                             |  41%
  |                                                                                                             
  |=================================================================                                      |  63%
  |                                                                                                             
  |=======================================================================================================| 100%
class(df_cencus)
[1] "sf"         "data.frame"
plot(df_cencus)

It’s in long format. Let’s make it wide.

df_cens=df_cencus %>% select(-moe) %>% spread(variable,estimate) 

tm_shape(df_cens) +tm_polygons("median_inc",alpha=.5)

  tm_shape(neigh2) +tm_borders(col="blue",lwd=5,alpha=.2)+
  tm_shape(df_cens) +tm_borders(col="red",lwd=1,alpha=.3)
#<<<<<<< HEAD
#df_j=st_join(df_cens,neigh2)
#=======
#df_j=st_join(df_cens,neigh2,prepared=FALSE)
#>>>>>>> aaf01be5cf721819dd2df615aef7a1999bcec0c2
df_cens_adj=df_cens %>% st_transform(4326)
df_j=st_join(df_cens_adj,neigh2,largest=TRUE)
Warning: attribute variables are assumed to be spatially constant throughout all geometries

Other order?:

#<<<<<<< HEAD
##df_j_rev = st_join(neigh2,df_cens_adj,largest=TRUE)
#=======
#df_j_rev = st_join(neigh2,df_cens_adj,largest=TRUE)
#>>>>>>> aaf01be5cf721819dd2df615aef7a1999bcec0c2

Since we want the geometry for the NEIGHBORHOODS, we need a to work a little harder:

df1=df_j %>% select(median_inc,pop,pop_black,objectid) %>%
  group_by(objectid) %>%
  summarise(pop_n=sum(pop),
            pop_black_n=sum(pop_black), 
            adj_median_income=sum(pop*median_inc)/pop_n) 

plot(df1)

#df2=left_join(neigh2,df1)

df2=left_join(neigh2,df1 %>% st_set_geometry(NULL))
Joining, by = "objectid"
df2=df2 %>% mutate(black_perc=pop_black_n/pop_n, covid_rate=total_positives/pop_n)
tm_shape(df2)+tm_polygons(c("adj_median_income","covid_rate","black_perc"))
df2 %>% filter(objectid!=30) %>% tm_shape()+tm_polygons(c("adj_median_income","covid_rate","black_perc"),alpha=.4)
LS0tDQp0aXRsZTogIlJlcGxhY2VtZW50IENsYXNzIC0gQ2xlYW5pbmcgdXAgYW5kIFNwYXRpYWwgSm9pbnMiDQpkYXRlOiAgIjIwMjItMTEtMDkiDQphdXRob3I6ICJTYXJhaCBLb2hscyINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNClRoaXMgbm90ZWJvb2tzIHdpbGwgaGVscCBtZSB0YWxrIHRocm91Z2ggYSBiaXQgb2YgbWF0ZXJpYWwgcmVnYXJkaW5nICpzcGF0aWFsIGRhdGEqLCANCndpdGggc29tZSBpbnRyb2R1Y3Rpb24gdG8gYHNwYXRpYWwgam9pbnNgLg0KDQpLZXkgdG9waWNzIGZvciB0b2RheToNCg0KKiBVc2luZyBgc2ZgIHBhY2thZ2UNCiogVXNpbmcgYHRtYXBgIHBhY2thZ2UNCiogVXNpbmcgYHRpZHljZW5zdXNgIHBhY2thZ2UNCiogUmVtaW5kZXIgb24gam9pbnMgKGZvY3VzOiBsZWZ0IGpvaW4pDQoqIE91ciBTcGF0aWFsIERhdGENCiAgICogTmVpZ2hib3Job29kcw0KICAgKiBKb2luaW5nIHdpdGggbm9uLXNwYXRpYWwgZGF0YQ0KICAgKiBDZW5zdXMgZGF0YQ0KICAgKiBKb2luaW5nIHdpdGggc3BhdGlhbCBkYXRhDQoqIGlnbm9yZSBodG1sIGluIGdpdCAgKGlmIHRpbWUpDQoNCiMjIFBhY2thZ2VzDQoNClN0YW5kYXJkczoNCg0KYGBge3J9DQpsaWJyYXJ5KGtuaXRyKQ0KbGlicmFyeSh0aWR5dmVyc2UpDQpsaWJyYXJ5KGphbml0b3IpDQpsaWJyYXJ5KGx1YnJpZGF0ZSkgIyBiZWNhdXNlIHdlIHdpbGwgcHJvYmFibHkgc2VlIHNvbWUgZGF0ZXMNCmxpYnJhcnkoaGVyZSkgIyBhIHBhY2thZ2UgSSBoYXZlbid0IHRhdWdodCB5b3UgYWJvdXQgYmVmb3JlIHRoYXQgZG9lc24ndCBkbyBtdWNoLCBidXQgLi4uLg0KYGBgDQoNClNvbWUgYWRkaXRpb25hbCBwYWNrYWdlcyBmb2N1c2VkIG9uIHRvZGF5J3Mgd29yazoNCg0KYGBge3J9DQpsaWJyYXJ5KHNmKSAjIHdvcmtpbmcgd2l0aCBzaW1wbGUgZmVhdHVyZXMgLSBnZW9zcGF0aWFsDQpsaWJyYXJ5KHRtYXApDQpsaWJyYXJ5KHRpZHljZW5zdXMpDQoNCmBgYA0KIyMgSW5mb3JtYXRpb25hbCByZXNvdXJjZXMNCg0KKiBBbiBvdmVyYWxsIHJlc291cmNlIG9uIG1hcHBpbmcgaW4gUjogaHR0cHM6Ly9ib29rZG93bi5vcmcvbmljb2hhaG4vbWFraW5nX21hcHNfd2l0aF9yNS9kb2NzL2ludHJvZHVjdGlvbi5odG1sDQoqIEEgc3RhcnRpbmcgcG9pbnQgdG8gbGVhcm4gYWJvdXQgYHNmYDogIGh0dHBzOi8vci1zcGF0aWFsLmdpdGh1Yi5pby9zZi9hcnRpY2xlcy8NCiogR2V0dGluZyBzdGFydGVkIHdpdGggYHRtYXBgOiBodHRwczovL2NyYW4uci1wcm9qZWN0Lm9yZy93ZWIvcGFja2FnZXMvdG1hcC92aWduZXR0ZXMvdG1hcC1nZXRzdGFydGVkLmh0bWwNCiogVGhlIGB0aWR5Y2Vuc3VzYCBwYWNrYWdlOiBodHRwczovL3dhbGtlci1kYXRhLmNvbS90aWR5Y2Vuc3VzL2luZGV4Lmh0bWwNCiogVGhlIGJvb2sgb24gYHRpZHljZW5zdXNgIDogaHR0cHM6Ly93YWxrZXItZGF0YS5jb20vY2Vuc3VzLXIvaW5kZXguaHRtbA0KDQoNCiMjIFVzaW5nIHRoZSBOZWlnaGJvcmhvb2QgR2Vvc3BhdGlhbCBEYXRhICh1c2luZyAvZGF0YSkNCg0KDQpPdXIgZmlyc3QgZGF0YSBzb3VyY2UgY29tZXMgZnJvbSBvcGVuZGF0YS5kYw0KDQpodHRwczovL29wZW5kYXRhLmRjLmdvdi9kYXRhc2V0cy9EQ0dJUzo6ZGMtaGVhbHRoLXBsYW5uaW5nLW5laWdoYm9yaG9vZHMvYWJvdXQNCg0KDQpJIHdpbGwgdXNlIHRoZSBHZW9KU09OIGZpbGUuICAoTmV3ZXIsIG5vdCBuZWNlc3NhcmlseSBiZXR0ZXIsIGJ1dCAuLi4gYSBzaW5nbGUgZmlsZS4gIE5vdCBzbWFsbGVyLCBidXQgLi4uIHRoaXMgb25lIGlzIG5vdCBiaWcuKSAgDQoNCg0KDQoNCkRhdGEgaXMgZWFzaWx5IHJlYWRhYmxlIA0KYGBge3J9DQpuZWlnaD1zdF9yZWFkKGhlcmUoIkRDX0hlYWx0aF9QbGFubmluZ19OZWlnaGJvcmhvb2RzLmdlb2pzb24iKSkgJT4lIGNsZWFuX25hbWVzKCkNCmNsYXNzKG5laWdoKQ0KYGBgDQoNCmBgYHtyfQ0KcGxvdChuZWlnaCkNCmBgYA0KDQoNCg0KIyMgUmVtaW5kZXIgLSBKb2lucw0KDQpgYGB7cn0NCmRmMT10aWJibGUoZnJ1aXQ9YygiYXBwbGUiLCJiYW5hbmEiLCJjaGVycnkiKSxjb3N0PWMoMS41LDEuMiwyLjI1KSkNCmRmMj10aWJibGUoZnJ1aXQ9YygiYXBwbGUiLCJhcHBsZSIsImNoZXJyeSIsImxlbW9uIiksDQogICAgICAgICAgIGRlc2VydD1jKCJwaWUiLCJjb2JibGVyIiwiY29iYmxlciIsImNoZWVzZWNha2UiKSwNCiAgICAgICAgICAgY2FsPWMoNDAwLDQzMCw1MDAsNTUwKSkNCmRmMQ0KYGBgDQpgYGB7cn0NCmRmMg0KYGBgDQpgYGB7cn0NCmxlZnRfam9pbihkZjEsZGYyLGJ5PSJmcnVpdCIpDQpgYGANCg0KIyMgSW52ZXN0aWdhdGluZyBqb2luaW5nIHNwYXRpYWwgYW5kIG5vbi1zcGF0aWFsIGRhdGENCg0KDQpDb3ZpZCBjYXNlIGluZm9ybWF0aW9uIGlzIGF2YWlsYWJsZSBmcm9tIG9wZW5kYXRhZGM6DQoNCmh0dHBzOi8vb3BlbmRhdGEuZGMuZ292L2RhdGFzZXRzL0RDR0lTOjpkYy1jb3ZpZC0xOS10b3RhbC1wb3NpdGl2ZS1jYXNlcy1ieS1uZWlnaGJvcmhvb2QvYWJvdXQNCg0KUmVhZCBjYXNlcyBpbmZvcm1hdGlvbjoNCg0KYGBge3IgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0NCmRmX2M9cmVhZF9jc3YoaGVyZSgiRENfQ09WSUQtMTlfVG90YWxfUG9zaXRpdmVfQ2FzZXNfYnlfTmVpZ2hib3Job29kLmNzdiIpKSAlPiUgY2xlYW5fbmFtZXMoKSANCg0KZGZfY2FzZXM9ZGZfYyAlPiUNCiAgZmlsdGVyKGFzX2RhdGUoZGF0ZV9yZXBvcnRlZCkgPT0gIjIwMjItMDItMjIiKSAlPiUgDQogIHNlcGFyYXRlKG5laWdoYm9yaG9vZCxpbnRvPWMoImNvZGUiLCJuYW1lIiksc2VwID0gIjoiKSAlPiUNCiAgbXV0YXRlKGNvZGU9Y2FzZV93aGVuKA0KICAgIGNvZGU9PSJOMzUiIH4iTjAiLA0KICAgIFRSVUUgfiBjb2RlDQogICkpICU+JQ0KICBzZWxlY3QoLW9iamVjdGlkLC1kYXRlX3JlcG9ydGVkKQ0KDQoNCmBgYA0KDQoNCiMjIFJlZ3VsYXIgam9pbmluZyAob2YgZGF0YWZyYW1lcykNCg0KYGBge3J9DQpuZWlnaDI9bGVmdF9qb2luKG5laWdoLGRmX2Nhc2VzLGJ5PWMoImNvZGUiKSkgDQoNCnRtYXBfbW9kZSgidmlldyIpDQoNCnRtX3NoYXBlKG5laWdoMikgK3RtX3BvbHlnb25zKCJ0b3RhbF9wb3NpdGl2ZXMiLGFscGhhPS41KQ0KYGBgDQoNCg0KIyMgSm9pbmluZyB3aXRoIG90aGVyIHNwYXRpYWwgZGF0YQ0KDQpMZXQncyBnZXQgc29tZSBkYXRhIHVzaW5nIGB0aWR5Y2Vuc3VzYC4gIE5lZWQgYW4gQVBJIGtleSAgIGh0dHBzOi8vYXBpLmNlbnN1cy5nb3YvZGF0YS9rZXlfc2lnbnVwLmh0bWwNCg0KDQpgYGB7cn0NCg0KDQpjZW5zdXNfYXBpX2tleSgiMDBiNzhjZTUyNDYzYmYzODZhMjYwZDIzZWM1OGVkYjYyMmU2ZDNhYyIpDQoNCiN3aGF0IHZhcmlhYmxlcw0KdjIwID0gbG9hZF92YXJpYWJsZXMoMjAxOCwiYWNzNSIpDQojIG1lZGlhbl9mYW1pbHlfaW5jb21lPSIJQjA2MDExXzAwMSIgDQojIGFsbCAiQjAwMDAxXzAwMSIJDQojYmxhY2sgIkIwMjAwOV8wMDEiDQpgYGANCg0KDQpHZXQgc29tZSBkYXRhOg0KDQpgYGB7cn0NCmRmX2NlbmN1cz1nZXRfYWNzKGdlb2dyYXBoeSA9ICJ0cmFjdCIsDQogICAgICAgICAgICAgICAgICB2YXJpYWJsZXM9YygibWVkaWFuX2luYyI9IkIwNjAxMV8wMDEiLA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgInBvcCI9IkIwMTAwMV8wMDEiLA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgInBvcF9ibGFjayI9IkIwMjAwOV8wMDEiKSwNCiAgICAgICAgICAgICAgICAgIHN0YXRlPSJEQyIsZ2VvbWV0cnk9VFJVRSx5ZWFyPTIwMTgpIA0KYGBgDQoNCmBgYHtyfQ0KY2xhc3MoZGZfY2VuY3VzKQ0KcGxvdChkZl9jZW5jdXMpDQpgYGANCkl0J3MgaW4gbG9uZyBmb3JtYXQuICBMZXQncyBtYWtlIGl0IHdpZGUuDQpgYGB7cn0NCmRmX2NlbnM9ZGZfY2VuY3VzICU+JSBzZWxlY3QoLW1vZSkgJT4lIHNwcmVhZCh2YXJpYWJsZSxlc3RpbWF0ZSkgDQoNCnRtX3NoYXBlKGRmX2NlbnMpICt0bV9wb2x5Z29ucygibWVkaWFuX2luYyIsYWxwaGE9LjUpDQpgYGANCg0KDQpgYGB7cn0NCg0KICB0bV9zaGFwZShuZWlnaDIpICt0bV9ib3JkZXJzKGNvbD0iYmx1ZSIsbHdkPTUsYWxwaGE9LjIpKw0KICB0bV9zaGFwZShkZl9jZW5zKSArdG1fYm9yZGVycyhjb2w9InJlZCIsbHdkPTEsYWxwaGE9LjMpDQpgYGANCg0KDQoNCmBgYHtyfQ0KIzw8PDw8PDwgSEVBRA0KI2RmX2o9c3Rfam9pbihkZl9jZW5zLG5laWdoMikNCiM9PT09PT09DQojZGZfaj1zdF9qb2luKGRmX2NlbnMsbmVpZ2gyLHByZXBhcmVkPUZBTFNFKQ0KIz4+Pj4+Pj4gYWFmMDFiZTVjZjcyMTgxOWRkMmRmNjE1YWVmN2ExOTk5YmNlYzBjMg0KYGBgDQoNCmBgYHtyfQ0KZGZfY2Vuc19hZGo9ZGZfY2VucyAlPiUgc3RfdHJhbnNmb3JtKDQzMjYpDQpgYGANCg0KYGBge3J9DQpkZl9qPXN0X2pvaW4oZGZfY2Vuc19hZGosbmVpZ2gyLGxhcmdlc3Q9VFJVRSkNCmBgYA0KT3RoZXIgb3JkZXI/Og0KDQpgYGB7cn0NCiM8PDw8PDw8IEhFQUQNCiMjZGZfal9yZXYgPSBzdF9qb2luKG5laWdoMixkZl9jZW5zX2FkaixsYXJnZXN0PVRSVUUpDQojPT09PT09PQ0KI2RmX2pfcmV2ID0gc3Rfam9pbihuZWlnaDIsZGZfY2Vuc19hZGosbGFyZ2VzdD1UUlVFKQ0KIz4+Pj4+Pj4gYWFmMDFiZTVjZjcyMTgxOWRkMmRmNjE1YWVmN2ExOTk5YmNlYzBjMg0KYGBgDQoNClNpbmNlIHdlIHdhbnQgdGhlIGdlb21ldHJ5IGZvciB0aGUgTkVJR0hCT1JIT09EUywgd2UgbmVlZCBhIHRvIHdvcmsgYSBsaXR0bGUgaGFyZGVyOg0KDQpgYGB7cn0NCmRmMT1kZl9qICU+JSBzZWxlY3QobWVkaWFuX2luYyxwb3AscG9wX2JsYWNrLG9iamVjdGlkKSAlPiUNCiAgZ3JvdXBfYnkob2JqZWN0aWQpICU+JQ0KICBzdW1tYXJpc2UocG9wX249c3VtKHBvcCksDQogICAgICAgICAgICBwb3BfYmxhY2tfbj1zdW0ocG9wX2JsYWNrKSwgDQogICAgICAgICAgICBhZGpfbWVkaWFuX2luY29tZT1zdW0ocG9wKm1lZGlhbl9pbmMpL3BvcF9uKSANCg0KcGxvdChkZjEpDQpgYGANCg0KYGBge3J9DQojZGYyPWxlZnRfam9pbihuZWlnaDIsZGYxKQ0KDQpkZjI9bGVmdF9qb2luKG5laWdoMixkZjEgJT4lIHN0X3NldF9nZW9tZXRyeShOVUxMKSkNCg0KYGBgDQoNCmBgYHtyfQ0KZGYyPWRmMiAlPiUgbXV0YXRlKGJsYWNrX3BlcmM9cG9wX2JsYWNrX24vcG9wX24sIGNvdmlkX3JhdGU9dG90YWxfcG9zaXRpdmVzL3BvcF9uKQ0KdG1fc2hhcGUoZGYyKSt0bV9wb2x5Z29ucyhjKCJhZGpfbWVkaWFuX2luY29tZSIsImNvdmlkX3JhdGUiLCJibGFja19wZXJjIikpDQpgYGANCg0KDQoNCmBgYHtyfQ0KZGYyICU+JSBmaWx0ZXIob2JqZWN0aWQhPTMwKSAlPiUgdG1fc2hhcGUoKSt0bV9wb2x5Z29ucyhjKCJhZGpfbWVkaWFuX2luY29tZSIsImNvdmlkX3JhdGUiLCJibGFja19wZXJjIiksYWxwaGE9LjQpDQpgYGANCg0KDQo=